Distant Supervision for Cancer Pathway Extraction from Text
نویسندگان
چکیده
Biological pathways are central to understanding complex diseases such as cancer. The majority of this knowledge is scattered in the vast and rapidly growing research literature. To automate knowledge extraction, machine learning approaches typically require annotated examples, which are expensive and time-consuming to acquire. Recently, there has been increasing interest in leveraging databases for distant supervision in knowledge extraction, but existing applications focus almost exclusively on newswire domains. In this paper, we present the first attempt to formulate the distant supervision problem for pathway extraction and apply a state-of-the-art method to extracting pathway interactions from PubMed abstracts. Experiments show that distant supervision can effectively compensate for the lack of annotation, attaining an accuracy approaching supervised results. From 22 million PubMed abstracts, we extracted 1.5 million pathway interactions at a precision of 25%. More than 10% of interactions are mentioned in the context of one or more cancer types, analysis of which yields interesting insights.
منابع مشابه
Prior-informed Distant Supervision for Temporal Evidence Classification
Temporal evidence classification, i.e., finding associations between temporal expressions and relations expressed in text, is an important part of temporal relation extraction. To capture the variations found in this setting, we employ a distant supervision approach, modeling the task as multi-class text classification. There are two main challenges with distant supervision: (1) noise generated...
متن کاملLiterature mining of protein-residue associations with graph rules learned through distant supervision
BACKGROUND We propose a method for automatic extraction of protein-specific residue mentions from the biomedical literature. The method searches text for mentions of amino acids at specific sequence positions and attempts to correctly associate each mention with a protein also named in the text. The methods presented in this work will enable improved protein functional site extraction from arti...
متن کاملModeling Relations and Their Mentions without Labeled Text
Several recent works on relation extraction have been applying the distant supervision paradigm: instead of relying on annotated text to learn how to predict relations, they employ existing knowledge bases (KBs) as source of supervision. Crucially, these approaches are trained based on the assumption that each sentence which mentions the two related entities is an expression of the given relati...
متن کاملGrounded Semantic Parsing for Complex Knowledge Extraction
Recently, there has been increasing interest in learning semantic parsers with indirect supervision, but existing work focuses almost exclusively on question answering. Separately, there have been active pursuits in leveraging databases for distant supervision in information extraction, yet such methods are often limited to binary relations and none can handle nested events. In this paper, we g...
متن کاملDistant Supervision for Relation Extraction Using Ontology Class Hierarchy-Based Features
Relation extraction is a key step in the problem of structuring natural language text. This paper demonstrates a multi-class classifier for relation extraction, constructed using the distant supervision approach, along with resources of the Semantic Web. In particular, the classifier uses a feature based on the class hierarchy of an ontology that, in conjunction with basic lexical features, imp...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing
دوره شماره
صفحات -
تاریخ انتشار 2015